Skip to content

feat(plugin-hitl): add human-in-the-loop tool approval flow#23

Draft
exoticknight wants to merge 3 commits into
mainfrom
codex/plugin-hitl
Draft

feat(plugin-hitl): add human-in-the-loop tool approval flow#23
exoticknight wants to merge 3 commits into
mainfrom
codex/plugin-hitl

Conversation

@exoticknight
Copy link
Copy Markdown
Member

@exoticknight exoticknight commented May 27, 2026

IMPOTENT: rely on moeru-ai/xsai#309

feat

  • Add core resolveToolCallHooks support that bridges plugin preToolCall / postToolCall hooks into xsAI response execution.
  • Add tool-interruption Apeira events and exported HITL hook/interruption types for host UIs.
  • Add per-plugin private session state that is snapshotted, restored, forked, and passed into plugin hooks.
  • Introduce @apeira/plugin-hitl with allow/ask/deny/off modes, policy callbacks, pending request tracking, call/run/conversation approval scopes, model-visible rejection results, and optional tool metadata.
  • Export @apeira/plugin-hitl through the umbrella apeira package.
  • Teach @apeira/plugin-ag-ui to map HITL interruptions into HITL_REVIEW_REQUIRED tool results for review-card rendering.
  • Add deterministic HITL demo flows for CopilotKit and pi-tui, including replay fetch helpers, mock demo tools, approval/rejection paths, CopilotKit review cards, and pi-tui /approve, /reject, /demo, and /hitl commands.
  • Add a Chat SDK Telegram example that streams Apeira agent text deltas into chat threads with in-memory Chat SDK state and persisted Apeira sessions.

refactor

  • Rename the core integration from tool-call control decisions to tool-call transformer hooks, matching the xsAI preToolCall / postToolCall API.
  • Share HITL demo runtime helpers between examples under examples/shared/hitl-demo.ts.
  • Extract pi-tui transcript append and HITL demo transcript helpers into hitl-demo-transcript.ts.

chore

  • Add Vitest coverage for the HITL plugin, core plugin private state, AG-UI interruption mapping, CopilotKit HITL runtime/demo behavior, and pi-tui demo helpers.
  • Add package/workspace wiring for @apeira/plugin-hitl, Chat SDK dependencies, example scripts, lockfile updates, and CopilotKit HITL env prefix support.
  • Bump xsAI catalog packages to the HITL beta build required by tool-call hooks.
  • Add the Chat SDK skill metadata and ignore .apeira example session storage.
  • Add common-tools planning artifacts under docs/plans for approval policy and roadmap follow-up work.

review aids

Core Hook Flow

sequenceDiagram
  participant Model as xsAI responses
  participant Core as Apeira core
  participant Plugin as Plugin hooks
  participant HITL as @apeira/plugin-hitl
  participant Host as UI / TUI host
  participant Tool as Tool executor

  Model->>Core: function_call
  Core->>Plugin: preToolCall(toolCall, executeOptions)
  Plugin->>HITL: inspect tool key + policy

  alt allowed by mode/policy/scope
    HITL-->>Plugin: void
    Plugin-->>Core: void or transformed toolCall
    Core->>Tool: execute(args)
    Tool-->>Core: tool result
    Core->>Plugin: postToolCall(toolResult)
    Core-->>Model: function_call_output
  else denied
    HITL-->>Plugin: synthetic CompletionToolResult
    Plugin-->>Core: TOOL_HITL_REJECTED result
    Core-->>Model: function_call_output
  else human approval required
    HITL-->>Host: tool-interruption event
    HITL-->>Plugin: pending promise
    Host->>HITL: approve(id, scope) or reject(id)
    HITL-->>Plugin: continue or synthetic rejection result
    Plugin-->>Core: resolved hook result
  end
Loading

Approval Scope Model

flowchart TD
  ToolCall["Tool call key = toolName + args"] --> Mode{HITL mode}
  Mode -->|allow/off| Continue[Continue tool execution]
  Mode -->|deny| Reject[Return synthetic rejection result]
  Mode -->|ask| Existing{Existing approval?}

  Existing -->|conversation private state| Continue
  Existing -->|run allow set| Continue
  Existing -->|none| Interrupt[Emit tool-interruption]

  Interrupt --> Human{Human decision}
  Human -->|approve call| Continue
  Human -->|approve run| RunAllow[Remember key until turn done]
  Human -->|approve conversation| ConversationAllow[Persist key in plugin private state]
  Human -->|reject| Reject

  RunAllow --> Continue
  ConversationAllow --> Continue
  RunAllow --> TurnDone[onTurnDone clears run approvals]
Loading

Package Wiring

flowchart LR
  Core["@apeira/core"]
  HITL["@apeira/plugin-hitl"]
  AGUI["@apeira/plugin-ag-ui"]
  Apeira["@apeira/apeira"]
  Shared["examples/shared/hitl-demo.ts"]
  Copilot["examples/copilotkit"]
  PiTui["examples/pi-tui"]
  ChatSDK["examples/chat-sdk"]

  Core -->|ToolCallHooks + privateState| HITL
  Core -->|tool-interruption event| AGUI
  HITL -->|umbrella export| Apeira
  HITL --> Copilot
  HITL --> PiTui
  AGUI -->|HITL_REVIEW_REQUIRED result| Copilot
  Shared --> Copilot
  Shared --> PiTui
  Core --> ChatSDK
Loading

Example Runtime Flows

flowchart TD
  DemoStart["User starts HITL demo"] --> Replay["createHitlReplayFetch returns deterministic response"]
  Replay --> FunctionCall["Assistant emits function_call"]
  FunctionCall --> Hook["Apeira runs preToolCall hook"]
  Hook --> Review["plugin-hitl emits tool-interruption"]
  Review --> HostUI{"Host surface"}
  HostUI -->|CopilotKit| ReviewCard["Render review card"]
  HostUI -->|pi-tui| SlashCommands["Show /approve and /reject commands"]
  ReviewCard --> Decision["approve/reject controller call"]
  SlashCommands --> Decision
  Decision --> Resume["Pending preToolCall promise resolves"]
  Resume --> ExecuteOrReject{"Approved?"}
  ExecuteOrReject -->|yes| ToolExec["Execute mock tool"]
  ExecuteOrReject -->|no| Rejection["Return TOOL_HITL_REJECTED result"]
Loading

Suggested Review Order

Step Focus Files
1 Core public contract and hook bridging packages/core/src/types/plugin.ts, packages/core/src/types/event.ts, packages/core/src/utils/turn-runner.ts
2 Plugin private session state semantics packages/core/src/utils/agent-runtime.ts, packages/core/src/utils/agent.ts, packages/core/test/utils/agent.test.ts
3 HITL approval/rejection behavior packages/plugin-hitl/src/index.ts, packages/plugin-hitl/test/index.test.ts, packages/plugin-hitl/README.md
4 AG-UI interruption mapping packages/plugin-ag-ui/src/index.ts, packages/plugin-ag-ui/test/index.test.ts
5 Demo surfaces examples/shared/hitl-demo.ts, examples/copilotkit/src/components/chat-panel.tsx, examples/pi-tui/src/app.ts
6 Chat SDK example and workspace wiring examples/chat-sdk/*, .agents/skills/chat-sdk/SKILL.md, package.json, pnpm-workspace.yaml, pnpm-lock.yaml

@exoticknight exoticknight requested a review from kwaa May 27, 2026 07:22
@exoticknight exoticknight self-assigned this May 27, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces human-in-the-loop (HITL) tool control capabilities for Apeira agents, adding a new @apeira/plugin-hitl package, updating core runtime types to support tool interruption and private plugin state, and integrating these features into the CopilotKit and Pi-TUI examples.

Three important issues were identified in the review: a critical persistence bug in @apeira/plugin-hitl where out-of-turn approvals update a stale private state closure and fail to persist; a high-severity parsing bug in CopilotKit's parseHitlReview where multi-line JSON arguments containing newlines break the simple line-splitting logic; and a medium-severity opportunity to canonicalize JSON arguments in getToolKey to prevent identical tool calls with different key orderings from bypassing conversation-scope approvals.

Comment on lines +87 to +245
export const hitl = (options: HITLPluginOptions = {}): HITLController => {
let mode: HITLMode = options.mode ?? 'ask'
const callAllows = new Set<string>()
const pending = new Map<string, HITLPendingRequest>()
const pendingState = new Map<string, PluginPrivateStateApi<HITLPrivateState>>()
const resumeDecisions = new Map<string, HITLResumeDecision>()
const rejectedKeys = new Map<string, string>()
const runAllows = new Set<string>()

const resolveDecision = async (context: HITLPolicyContext): Promise<HITLPolicyDecision> => {
if (mode === 'off' || mode === 'allow')
return { type: 'allow' }
if (mode === 'deny')
return { type: 'deny' }

return await options.policy?.(context) ?? { type: 'ask' }
}

const createControl = (privateState: PluginPrivateStateApi<HITLPrivateState>): ToolCallControl => ({
preToolCall: async (context) => {
const key = getToolKey(context.toolCall)

if (mode === 'deny') {
return {
message: 'TOOL_HITL_REJECTED',
type: 'reject',
}
}

const rejectedMessage = rejectedKeys.get(key)
if (rejectedMessage != null) {
rejectedKeys.delete(key)
return {
message: rejectedMessage,
type: 'reject',
}
}

if (callAllows.has(key)) {
callAllows.delete(key)
return { type: 'continue' }
}

if (runAllows.has(key) || hasConversationAllow(privateState, key))
return { type: 'continue' }

const decision = await resolveDecision({
...context,
key,
metadata: metadata.get(context.tool),
})

if (decision.type === 'allow')
return { type: 'continue' }

if (decision.type === 'deny') {
return {
message: decision.message ?? 'TOOL_HITL_REJECTED',
type: 'reject',
}
}

const interruption: ToolInterruption = {
data: {
key,
metadata: metadata.get(context.tool),
},
id: `hitl_${context.toolCall.id}`,
reason: decision.message ?? 'Human review required.',
toolCall: context.toolCall,
}

pending.set(interruption.id, {
id: interruption.id,
key,
metadata: metadata.get(context.tool),
reason: interruption.reason,
toolCall: context.toolCall,
toolName: context.tool.function.name,
})
pendingState.set(interruption.id, privateState)

return {
interruption,
type: 'interrupt',
}
},
})

const approve: HITLController['approve'] = (id, scope = options.scope ?? 'call') => {
const request = pending.get(id)
if (request == null)
return false

if (scope === 'call') {
callAllows.add(request.key)
}
else if (scope === 'run') {
runAllows.add(request.key)
}
else {
const state = pendingState.get(id)
if (state == null)
return false

setConversationAllow(state, request.key)
}

resumeDecisions.set(id, { id, scope, type: 'approved' })
pending.delete(id)
pendingState.delete(id)
return true
}

const reject: HITLController['reject'] = (id, message = 'TOOL_HITL_REJECTED') => {
const request = pending.get(id)
if (request == null)
return false

rejectedKeys.set(request.key, message)
resumeDecisions.set(id, {
id,
key: request.key,
message,
type: 'rejected',
})
pendingState.delete(id)
return pending.delete(id)
}

return {
approve,
clear: () => {
pending.clear()
pendingState.clear()
callAllows.clear()
rejectedKeys.clear()
resumeDecisions.clear()
runAllows.clear()
},
getDecisionForResume: id => resumeDecisions.get(id),
pending: () => [...pending.values()],
plugin: {
name,
onTurnDone: () => {
runAllows.clear()
},
resolveToolCallControl: ({ privateState }) => {
if (privateState == null)
throw new Error('@apeira/plugin-hitl requires plugin private state support from @apeira/core.')

return createControl(privateState)
},
version,
},
reject,
setMode: nextMode => mode = nextMode,
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Critical Persistence Bug: Stale Closure on Private State

When approve(id, 'conversation') is called, it retrieves the PluginPrivateStateApi from the pendingState map and updates it. However, because approve is called asynchronously after the turn has already completed, the turn's state has already been serialized and committed to storage. Any updates to this stale privateState closure reference are lost and never persisted to the session database or localStorage.

Solution

Instead of updating the private state immediately during the out-of-turn approve call, we can store the pending conversation approvals in an in-memory pendingConversationAllows set. When the next turn starts (e.g., the resume turn), resolveToolCallControl is called with a fresh, active privateState reference. We can then commit all pending conversation approvals to this active state, ensuring they are correctly serialized and persisted when the turn finishes.

export const hitl = (options: HITLPluginOptions = {}): HITLController => {
  let mode: HITLMode = options.mode ?? 'ask'
  const callAllows = new Set<string>()
  const pending = new Map<string, HITLPendingRequest>()
  const pendingConversationAllows = new Set<string>()
  const resumeDecisions = new Map<string, HITLResumeDecision>()
  const rejectedKeys = new Map<string, string>()
  const runAllows = new Set<string>()

  const resolveDecision = async (context: HITLPolicyContext): Promise<HITLPolicyDecision> => {
    if (mode === 'off' || mode === 'allow')
      return { type: 'allow' }
    if (mode === 'deny')
      return { type: 'deny' }

    return await options.policy?.(context) ?? { type: 'ask' }
  }

  const createControl = (privateState: PluginPrivateStateApi<HITLPrivateState>): ToolCallControl => ({
    preToolCall: async (context) => {
      const key = getToolKey(context.toolCall)

      if (mode === 'deny') {
        return {
          message: 'TOOL_HITL_REJECTED',
          type: 'reject',
        }
      }

      const rejectedMessage = rejectedKeys.get(key)
      if (rejectedMessage != null) {
        rejectedKeys.delete(key)
        return {
          message: rejectedMessage,
          type: 'reject',
        }
      }

      if (callAllows.has(key)) {
        callAllows.delete(key)
        return { type: 'continue' }
      }

      if (runAllows.has(key) || hasConversationAllow(privateState, key))
        return { type: 'continue' }

      const decision = await resolveDecision({
        ...context,
        key,
        metadata: metadata.get(context.tool),
      })

      if (decision.type === 'allow')
        return { type: 'continue' }

      if (decision.type === 'deny') {
        return {
          message: decision.message ?? 'TOOL_HITL_REJECTED',
          type: 'reject',
        }
      }

      const interruption: ToolInterruption = {
        data: {
          key,
          metadata: metadata.get(context.tool),
        },
        id: `hitl_${context.toolCall.id}`,
        reason: decision.message ?? 'Human review required.',
        toolCall: context.toolCall,
      }

      pending.set(interruption.id, {
        id: interruption.id,
        key,
        metadata: metadata.get(context.tool),
        reason: interruption.reason,
        toolCall: context.toolCall,
        toolName: context.tool.function.name,
      })

      return {
        interruption,
        type: 'interrupt',
      }
    },
  })

  const approve: HITLController['approve'] = (id, scope = options.scope ?? 'call') => {
    const request = pending.get(id)
    if (request == null)
      return false

    if (scope === 'call') {
      callAllows.add(request.key)
    }
    else if (scope === 'run') {
      runAllows.add(request.key)
    }
    else {
      pendingConversationAllows.add(request.key)
    }

    resumeDecisions.set(id, { id, scope, type: 'approved' })
    pending.delete(id)
    return true
  }

  const reject: HITLController['reject'] = (id, message = 'TOOL_HITL_REJECTED') => {
    const request = pending.get(id)
    if (request == null)
      return false

    rejectedKeys.set(request.key, message)
    resumeDecisions.set(id, {
      id,
      key: request.key,
      message,
      type: 'rejected',
    })
    return pending.delete(id)
  }

  return {
    approve,
    clear: () => {
      pending.clear()
      pendingConversationAllows.clear()
      callAllows.clear()
      rejectedKeys.clear()
      resumeDecisions.clear()
      runAllows.clear()
    },
    getDecisionForResume: id => resumeDecisions.get(id),
    pending: () => [...pending.values()],
    plugin: {
      name,
      onTurnDone: () => {
        runAllows.clear()
      },
      resolveToolCallControl: ({ privateState }) => {
        if (privateState == null)
          throw new Error('@apeira/plugin-hitl requires plugin private state support from @apeira/core.')

        if (pendingConversationAllows.size > 0) {
          for (const key of pendingConversationAllows) {
            setConversationAllow(privateState, key)
          }
          pendingConversationAllows.clear()
        }

        return createControl(privateState)
      },
      version,
    },
    reject,
    setMode: nextMode => mode = nextMode,
  }
}

Comment on lines +102 to +120
const parseHitlReview = (result: string | undefined): HitlReview | undefined => {
if (result == null || !result.includes('HITL_REVIEW_REQUIRED'))
return undefined

const lines = Object.fromEntries(result.split('\n').flatMap((line) => {
const [key, ...rest] = line.split('=')
return key == null || rest.length === 0 ? [] : [[key, rest.join('=')]]
}))

if (lines.id == null)
return undefined

return {
args: lines.args ?? '{}',
id: lines.id,
reason: lines.reason ?? 'Human review required.',
tool: lines.tool ?? 'tool',
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Multi-line JSON Parsing Bug

Splitting the raw result string by \n and then by = is extremely fragile. If the args parameter contains a formatted JSON string with newlines (which is common for complex tool arguments), splitting by \n will break the JSON structure into incomplete lines, causing JSON.parse to throw an error and rendering the review card broken.

Solution

Use a regular expression with the multiline (m) and dotall (s) flags to robustly extract the values of id, tool, args, and reason even if they span multiple lines.

Suggested change
const parseHitlReview = (result: string | undefined): HitlReview | undefined => {
if (result == null || !result.includes('HITL_REVIEW_REQUIRED'))
return undefined
const lines = Object.fromEntries(result.split('\n').flatMap((line) => {
const [key, ...rest] = line.split('=')
return key == null || rest.length === 0 ? [] : [[key, rest.join('=')]]
}))
if (lines.id == null)
return undefined
return {
args: lines.args ?? '{}',
id: lines.id,
reason: lines.reason ?? 'Human review required.',
tool: lines.tool ?? 'tool',
}
}
const parseHitlReview = (result: string | undefined): HitlReview | undefined => {
if (result == null || !result.includes('HITL_REVIEW_REQUIRED'))
return undefined
const idMatch = result.match(/^id=(.*)$/m)
const toolMatch = result.match(/^tool=(.*)$/m)
const argsMatch = result.match(/^args=([\s\S]*?)(?=\nreason=|\n$|$)/m)
const reasonMatch = result.match(/^reason=([\s\S]*)$/m)
if (idMatch == null)
return undefined
return {
args: argsMatch ? argsMatch[1].trim() : '{}',
id: idMatch[1].trim(),
reason: reasonMatch ? reasonMatch[1].trim() : 'Human review required.',
tool: toolMatch ? toolMatch[1].trim() : 'tool',
}
}

Comment thread packages/plugin-hitl/src/index.ts Outdated
Comment on lines +65 to +69
const getToolKey = (toolCall: ToolCall) =>
JSON.stringify([
toolCall.function.name,
toolCall.function.arguments ?? '',
])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Improvement: Canonicalize JSON Arguments

LLMs can occasionally generate identical tool arguments with different key ordering or whitespace formatting. Since getToolKey currently stringifies the raw arguments string directly, semantically identical calls could produce different keys, bypassing existing conversation-scope approvals.

Solution

Parse the arguments JSON and recursively sort the keys before stringifying to ensure a canonical representation.

const getToolKey = (toolCall: ToolCall) => {
  let args = toolCall.function.arguments ?? ''
  try {
    const parsed = JSON.parse(args)
    const canonicalize = (obj: any): any => {
      if (Array.isArray(obj))
        return obj.map(canonicalize)
      if (obj !== null && typeof obj === 'object') {
        return Object.keys(obj).sort().reduce((acc: any, key) => {
          acc[key] = canonicalize(obj[key])
          return acc
        }, {})
      }
      return obj
    }
    args = JSON.stringify(canonicalize(parsed))
  }
  catch {}
  return JSON.stringify([
    toolCall.function.name,
    args,
  ])
}

- Introduced @apeira/plugin-hitl for managing tool interruptions and approvals.
- Updated core types to support tool interruption events and private state management.
- Enhanced agent runtime to handle plugin-specific private states during turn processing.
- Modified session store to maintain plugin private state across context updates.
- Implemented tests for HITL plugin functionality, including approval and rejection scenarios.
- Updated package dependencies and workspace configuration to include new HITL plugin.
- Introduced HTML and Markdown versions of the Plugin Common Tools roadmap.
- Documented phases of development, including goals, status, scope, tasks, and acceptance criteria.
- Established a status legend for tracking progress on each phase.
- Highlighted the need for discussions on approval policies affecting multiple phases.
@exoticknight exoticknight changed the title feat(hitl): add human-in-the-loop tool control plugin feat(plugin-hitl): add human-in-the-loop tool approval flow May 27, 2026
@exoticknight
Copy link
Copy Markdown
Member Author

/gemini

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant